transition dynamic
- North America > United States > Washington > King County > Seattle (0.04)
- North America > Dominican Republic (0.04)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.74)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Maryland > Prince George's County > College Park (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China (0.04)
DOMINO: Decomposed Mutual Information Optimization for Generalized Context in Meta-Reinforcement Learning
Adapting to the changes in transition dynamics is essential in robotic applications. By learning a conditional policy with a compact context, context-aware meta-reinforcement learning provides a flexible way to adjust behavior according to dynamics changes. However, in real-world applications, the agent may encounter complex dynamics changes. Multiple confounders can influence the transition dynamics, making it challenging to infer accurate context for decision-making. This paper addresses such a challenge by decomposed mutual information optimization (DOMINO) for context learning, which explicitly learns a disentangled context to maximize the mutual information between the context and historical trajectories while minimizing the state transition prediction error. Our theoretical analysis shows that DOMINO can overcome the underestimation of the mutual information caused by multi-confounded challenges via learning disentangled context and reduce the demand for the number of samples collected in various environments. Extensive experiments show that the context learned by DOMINO benefits both model-based and model-free reinforcement learning algorithms for dynamics generalization in terms of sample efficiency and performance in unseen environments.
PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators
We consider offline reinforcement learning (RL) with heterogeneous agents under severe data scarcity, i.e., we only observe a single historical trajectory for every agent under an unknown, potentially sub-optimal policy. We find that the performance of state-of-the-art offline and model-based RL methods degrade significantly given such limited data availability, even for commonly perceived solved benchmark settings such as MountainCar and CartPole. To address this challenge, we propose PerSim, a model-based offline RL approach which first learns a personalized simulator for each agent by collectively using the historical trajectories across all agents, prior to learning a policy. We do so by positing that the transition dynamics across agents can be represented as a latent function of latent factors associated with agents, states, and actions; subsequently, we theoretically establish that this function is well-approximated by a low-rank decomposition of separable agent, state, and action latent functions. This representation suggests a simple, regularized neural network architecture to effectively learn the transition dynamics per agent, even with scarce, offline data. We perform extensive experiments across several benchmark environments and RL methods. The consistent improvement of our approach, measured in terms of both state dynamics prediction and eventual reward, confirms the efficacy of our framework in leveraging limited historical data to simultaneously learn personalized policies across agents.
- North America > United States > New Jersey > Middlesex County > New Brunswick (0.04)
- North America > United States > Massachusetts (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (2 more...)
- Transportation > Passenger (0.55)
- Transportation > Ground > Road (0.30)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.54)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.42)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.41)
- North America > United States > New Hampshire (0.04)
- North America > United States > Massachusetts (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)
- (2 more...)
- North America > United States > New Jersey > Middlesex County > New Brunswick (0.04)
- North America > United States > Massachusetts (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (2 more...)
- Transportation > Passenger (0.55)
- Transportation > Ground > Road (0.30)
- North America > United States > New Hampshire (0.04)
- North America > United States > Massachusetts (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)
- (2 more...)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- (7 more...)
- Leisure & Entertainment > Games (0.46)
- Education (0.46)